Spectral maxima representation for robust automatic speech recognition
Authors
Abstract
In automatic speech recognition, the popular Mel Frequency Cepstral Coefficient (MFCC) features perform very well under clean and matched conditions but are observed to fail under mismatched conditions. Spectral maxima are often observed to preserve their locations and energies in noisy environments, but the MFCC features do not represent them explicitly. This paper presents a framework for representing the maxima information for robust recognition in the presence of additive White Gaussian Noise (WGN). For the task of phoneme-based Isolated Word Recognition (IWR) under different Signal to Noise Ratio (SNR) conditions, the results show improved recognition performance. The cepstral features are computed from a spectrogram reconstructed by fitting Gaussians around the spectral maxima. In view of the inherent robustness and easy trackability of the maxima, this opens up interesting avenues towards robust feature representations as well as preprocessing techniques.
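The reconstruction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the Gaussian fitting procedure, the widths, or the filterbank, so everything here is an assumption for illustration: the function names, the fixed `width` parameter, the use of a pointwise maximum to combine the Gaussians, and a plain DCT-II cepstrum in place of a full MFCC pipeline.

```python
import numpy as np

def find_spectral_maxima(frame):
    """Return indices of strict local maxima in a 1-D magnitude spectrum."""
    interior = (frame[1:-1] > frame[:-2]) & (frame[1:-1] > frame[2:])
    return np.flatnonzero(interior) + 1

def reconstruct_frame(frame, width=2.0):
    """Rebuild a spectrum from Gaussians centred on each local maximum.

    Assumption: a fixed width (in bins) and a pointwise maximum over the
    Gaussians; the paper's actual fitting procedure is not stated.
    """
    bins = np.arange(len(frame))
    recon = np.zeros(len(frame), dtype=float)
    for k in find_spectral_maxima(frame):
        bump = frame[k] * np.exp(-0.5 * ((bins - k) / width) ** 2)
        recon = np.maximum(recon, bump)
    return recon

def cepstrum(frame, n_coeffs=13, floor=1e-10):
    """Unnormalised DCT-II of the log spectrum (no mel filterbank applied)."""
    log_spec = np.log(np.maximum(frame, floor))
    n = len(log_spec)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return basis @ log_spec

# Toy spectrum with two peaks, at bins 5 and 20 (hypothetical data).
bins = np.arange(32)
frame = (np.exp(-0.5 * ((bins - 5) / 2.0) ** 2)
         + 0.5 * np.exp(-0.5 * ((bins - 20) / 2.0) ** 2))
features = cepstrum(reconstruct_frame(frame))
```

Applying `reconstruct_frame` to every frame of a spectrogram before the cepstral transform keeps only the peak locations and energies, which is the information the abstract argues survives additive noise.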
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
Mel frequency cepstral coefficients (MFCC) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Robust Noise Estimation Applied to Different Speech Estimators
In this paper we present a robust noise estimation method for speech enhancement algorithms. The robust noise estimation, based on a modified minima controlled recursive averaging noise estimator, was applied to different speech estimators. The investigated speech estimators were spectral subtraction (SS), log spectral amplitude speech estimator (LSA) and optimally modified log spectral amplitude estim...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by creating natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Gradient Based Spectral Peak Location for Noise Robust Speech Recognition
In this paper, a gradient-based algorithm for finding spectral peak locations is presented. The algorithm uses frequency gradients and accelerations in the spectrogram to locate the peaks. The results are then interpolated to yield a smooth peak envelope. The method is evaluated in the Aurora framework. A first pass locates all ...
Recognizing the message and the messenger: biomimetic spectral analysis for robust speech and speaker recognition
Humans are quite adept at communicating in the presence of noise. However, most speech processing systems, such as automatic speech and speaker recognition systems, suffer a significant drop in performance when speech signals are corrupted with unseen background distortions. The proposed work explores the use of a biologically-motivated multi-resolution spectral analysis for speech representation....